Add differential testing harness #216

LegNeato · 2025-02-04T04:35:02Z

The test harness compiles and runs Rust
shaders/kernels and compares the output with those produced by equivalent WGSL
shaders/kernels. Unlike traditional reference, golden, or snapshot tests—which check
against static outputs—differential testing compares multiple independent implementations of the same logic.
This approach is more robust because discrepancies between them are more likely to
indicate real bugs rather than issues with outdated reference files.

You can run difftests via cargo difftest. This is an alias set up in .cargo/config
for cargo run --release -p difftest --. You can filter to run specific tests by
passing the (partial) filenames to cargo difftest some_file_name.

LegNeato · 2025-02-10T22:23:09Z

Perhaps better DX is to have directories and just cargo run --release in each?

LegNeato · 2025-02-21T22:02:36Z

@Firestar99 FYI, I am going to revamp this a bit as it is a bit of a mess.

LegNeato · 2025-03-16T01:56:46Z

FWIW, I am almost ready to update this. It is much cleaner now :-)

LegNeato · 2025-03-17T01:23:39Z

I think this is ready for review. Sorry it is so big! I can break it down if need be.

The main changes:

Move compiletests under tests/ and add difftests there as well.
Add a difftest harness
Add a difftest sdk/library used by the binaries under test.
Add a simple compute difftest test using wgpu (one variant written in rust, one in wgsl).
Add to CI (on linux install software GPU so the tests will run)
Some docs in a README.

LegNeato · 2025-03-17T12:34:39Z

This is what a run with passing and failing tests looks like:

On failing tests, the outputs are grouped as you may have 4 variants where 3 are the same and 1 is different. In that case the 3 same will be grouped in the output so it is easier to understand what is going on.

LegNeato · 2025-03-28T16:13:34Z

Another option we could do rather than binaries, is sort of what I did before (have built in harnesses which are configured) but mirror wgpu's harness and config format. Perhaps that would allow us to leverage their test corpus by just writing the rust-gpu versions of their test inputs.

If I get my naga rust-gpu backend working, we could generate the rust-gpu code on the fly from the source wgpu test 🤔

LegNeato · 2025-04-02T00:55:24Z

This doubles CI time. I am sure there are things I can do to make CI faster (such as the deps_helper in compile test) but perhaps we should do that in followups?

Firestar99 · 2025-06-04T10:11:35Z

I pulled out the tests/* to tests/compiletests/* rename as a separate commit and rebased it for you.

In your previous rebase you forgot to move some of the new tests. I would like to fast-track that rename in #270 so it doesn't happen again.

Firestar99 · 2025-06-04T10:52:55Z

I really like the concept and I love that we can diff rust-gpu, wgsl and rust on cpu or whatever else we can think of.

I've played around with it a bit and noticed a few things:

I assume all difftests are in a different workspace to not pollute our root workspace and slow down cargo due to evaluating the potentially hundreds of cargo projects in difftests? Would love to see a sentence about it in the readme, there it just states it's done this way but not why it is.
One crate is called difftest and the other difftests, confusing!
compile times
- FIXED: The slowest part of our project is compiling the c++ of spirv_tools, which we are building twice? Once to build difftest itself, run it and it building the tests builds it again? I assume we'd only need to compile it once, as we're only compiling rust shaders once as well.
- Due to it being a different workspace, we can't share the spirv_tools or wgpu build with the main workspace. I tried setting the target dir to the same dir, didn't help, though the next issue may be affecting my results.
- FIXED: Repeated runs still take 60s to run due to it recompiling something from scratch every time, as I see many rustc spinning up. This needs to be fixed if we ever want to iterate on bugs using difftests.
ci
- ci is not checking whether the new Cargo.lock is up-to-date with cargo fetch --locked
- Maybe we should split out difftests into a separate runner to speed up builds? Maybe do the same with compiletests? That I'd make a followup to allow for more experimentation.
- FIXED: the difftests are not using feature use-installed-tools due to being build by difftests and not directly from ci

(continually adding to this)

LegNeato · 2025-06-04T13:22:29Z

I assume all difftests are in a different workspace to not pollute our root workspace and slow down cargo due to evaluating the potentially hundreds of cargo projects in difftests? Would love to see a sentence about it in the readme, there it just states it's done this way but not why it is.

Yep, added text to that effect.

One crate is called difftest and the other difftests, confusing!

Yeah, I tried to match compiletest (they are called compiletests for the runner and https://crates.io/crates/compiletest_rs introduces a compiletest crate used by the runnner). Happy to change naming to make it less confusing.

Maybe we should split out difftests into a separate runner to speed up builds? Maybe do the same with compiletests? That I'd make a followup to allow for more experimentation.

Yes, I think we should. I was planning to do so in a followup. I wanted to keep this close to "as-is" so others can see where I just mirrored the existing comptiletest support.

Firestar99 · 2025-06-04T13:26:54Z

I'm working on the repeated runs, and noticed that there's some target_dir stomping going on. This is a clean run, and still the spirv-builder cargo run is marking crates that are supposedly clean as dirty:

     Running `tests/target/release/simple-compute-rust /tmp/.tmphUE36K`
   Compiling compiler_builtins v0.1.138
   Compiling core v0.0.0 (/home/firestar99/.rustup/toolchains/nightly-2024-11-22-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core)
       Dirty proc-macro2 v1.0.94: the rustflags changed
   Compiling proc-macro2 v1.0.94
   Compiling libm v0.2.11
       Dirty unicode-ident v1.0.18: the rustflags changed
   Compiling unicode-ident v1.0.18
       Dirty autocfg v1.4.0: the rustflags changed
   Compiling autocfg v1.4.0
   Compiling spirv-std-types v0.9.0 (/home/firestar99/workspace/frameworks/rust-gpu/crates/spirv-std/shared)

run3_clean.log

LegNeato · 2025-06-04T13:28:41Z

We could provide pre-made binary crates that one can just drop files in for simple cases. I'd still want to keep the top-level "run cargo run --release" API for ultimate flexibility, but maybe we move some straightforward "stock" runners into the runner binary itself rather than the way I did it where they are a library used by the test binaries and everything is cargo run --release. Not sure which way is better 🤷

This runs wgsl shaders and rust shaders and compares the output. If the output differs, the test fails. Differential testing is better than snapshot testing or golden file testing as there are no reference files to get outdated.

Firestar99 · 2025-06-04T14:29:57Z

The stomping is happening cause spirv-builder can only find it's target_dir of target/spirv-builder when it has the "OUT_DIR" env var, which is only present if a build script is present. Otherwise, it defaults to not setting it and stomping with the cpu compile.
We have an overwrite for the target_dir directory, but it only works if spirv-builder found the target dir and then it makes it relative to that. If you were to pass in an absolute path, it would just use that, ignoring the resolved value, but still requiring that it could be resolved.
I fixed it by making spirv-builder "accept absolute target_dir_path and properly resolve relative ones with cargo_metadata", which does introduce another dependency, though we have cargo_metadata in cargo-gpu already.
From 60s of stomping we're now down to about 2s of cpu compile and 2s shader compile incrementally.

…with `cargo_metadata`

Firestar99 · 2025-06-04T14:54:29Z

I have one concern about the design of this system: If we look at the main functions of simple_compute_rust and *_wgsl, they are effectively the same, apart from the shader they use. With the current system, we'd need to ensure that the wgsl and rust versions dispatch the shader with the same hypercube, it could as well be the same main binary with a flag to select which shader to use. While copy-pasting those few lines may not be a huge deal, if we start to copy-paste a whole setup for a graphics pipeline, that's an issue.
Alternatively, we could have one crate depend on the other, but then the one crate's main would just be calling a single function by supplying it a different shader. Seems silly to have a crate just for that.
Instead, I'd propose to have a single crate per "test case" containing both rust and wgsl shaders. The amount of variants would have to be discovered by running the binary without any args, and with args it'll run that variant (just an enum?). The individual runs are still isolated from each other by rerunning the binary each time, while simplifying code sharing a lot. And if you wanted to add a CPU variant to the current system, you could just branch off the cpu variant before you get to the gpu variants.
You could even argue that every test could just be a binary within a single global crate.

LegNeato · 2025-06-04T15:02:01Z

I assumed any code being shared by more than 2 tests would be put into the difftest crate for reuse. The reason I did it this was was for maximum control (let's say you wanted different deps or different rustflags, of even different entries in cargo.toml). But it could definitely be collapsed into one. I'm indifferent, I just wanted to make sure we could test MxN scenarios

Firestar99

I'd be fine merging it as is, using it for a bit and see how it feels like. We can still refactor it later and adjust all our existing difftests as need arises.

Also tested my spirv-builder patch with cargo-gpu, results in the exact same directory structure as before. I should mention though that I changed the type of target_dir from String to PathBuf, but I'd wager noone apart from me is using that setting anyway.

LegNeato force-pushed the difftests branch from 691fff8 to 8ed65f2 Compare February 4, 2025 04:35

LegNeato marked this pull request as ready for review February 6, 2025 23:29

LegNeato requested review from eddyb, Firestar99 and schell as code owners February 6, 2025 23:29

This comment was marked as outdated.

Sign in to view

LegNeato mentioned this pull request Feb 10, 2025

add support for leading_zeros, trailing_zeros and fix count_ones #213

Merged

LegNeato marked this pull request as draft March 13, 2025 15:37

LegNeato force-pushed the difftests branch 2 times, most recently from 2b484a8 to 4408a6e Compare March 17, 2025 01:17

LegNeato marked this pull request as ready for review March 17, 2025 01:21

LegNeato enabled auto-merge March 17, 2025 01:55

LegNeato force-pushed the difftests branch from 3f05dc2 to a9bc9ed Compare March 17, 2025 02:19

LegNeato force-pushed the difftests branch from a9bc9ed to ac2a383 Compare March 17, 2025 14:55

LegNeato force-pushed the difftests branch from ac2a383 to b9c5840 Compare April 1, 2025 01:44

LegNeato force-pushed the difftests branch from b9c5840 to 8089f31 Compare April 21, 2025 02:47

LegNeato force-pushed the difftests branch 2 times, most recently from fe29d54 to 496a00d Compare May 22, 2025 22:21

Firestar99 force-pushed the difftests branch from de7016b to ee99db4 Compare June 4, 2025 10:08

Firestar99 force-pushed the difftests branch from ee99db4 to a7d2ec1 Compare June 4, 2025 10:16

Firestar99 force-pushed the difftests branch from 1ecc3cd to 330262f Compare June 4, 2025 11:38

LegNeato and others added 6 commits June 4, 2025 16:24

Add differential testing harness

0db3d19

This runs wgsl shaders and rust shaders and compares the output. If the output differs, the test fails. Differential testing is better than snapshot testing or golden file testing as there are no reference files to get outdated.

Fix warnings

1e5b687

Update lockfile

2ee44f0

Prebuild all tests

a3236b5

remove unused crate dependencies

f373e60

difftest: forward use-installed-tools

eff8c1c

Firestar99 force-pushed the difftests branch from fff1cd3 to 2e76029 Compare June 4, 2025 14:26

Firestar99 and others added 2 commits June 4, 2025 16:34

accept absolute target_dir_path and properly resolve relative ones …

49da797

…with `cargo_metadata`

Add explanation for why different workspace

1eabbf4

Firestar99 force-pushed the difftests branch from 2e76029 to 1eabbf4 Compare June 4, 2025 14:34

Firestar99 approved these changes Jun 4, 2025

View reviewed changes

LegNeato added this pull request to the merge queue Jun 4, 2025

Merged via the queue into Rust-GPU:main with commit 63f98bb Jun 4, 2025
7 checks passed

LegNeato deleted the difftests branch June 4, 2025 15:52

Add differential testing harness #216

Add differential testing harness #216

Uh oh!

Conversation

LegNeato commented Feb 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

This comment was marked as outdated.

LegNeato commented Feb 10, 2025

Uh oh!

LegNeato commented Feb 21, 2025

Uh oh!

LegNeato commented Mar 16, 2025

Uh oh!

LegNeato commented Mar 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

LegNeato commented Mar 17, 2025

Uh oh!

LegNeato commented Mar 28, 2025

Uh oh!

LegNeato commented Apr 2, 2025

Uh oh!

Firestar99 commented Jun 4, 2025

Uh oh!

Firestar99 commented Jun 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

LegNeato commented Jun 4, 2025

Uh oh!

Firestar99 commented Jun 4, 2025

Uh oh!

LegNeato commented Jun 4, 2025

Uh oh!

Firestar99 commented Jun 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Firestar99 commented Jun 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

LegNeato commented Jun 4, 2025

Uh oh!

Firestar99 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

LegNeato commented Feb 4, 2025 •

edited

Loading

LegNeato commented Mar 17, 2025 •

edited

Loading

Firestar99 commented Jun 4, 2025 •

edited

Loading

Firestar99 commented Jun 4, 2025 •

edited

Loading

Firestar99 commented Jun 4, 2025 •

edited

Loading